Variable Selection in Data Mining Project

Author

  • Gilles Godbout
Abstract

Bishop (2) establishes that in practical applications of data mining, the choice of preprocessing applied to the available data is one of the most significant factors in determining the performance of the final system. Data mining situations are often characterized by a large number of raw input variables, sometimes in the tens of thousands, and comparatively few training examples. Learning algorithms perform well on domains with a relatively small number of relevant variables. They tend to degrade, however, in the presence of a large number of variables that may include irrelevant and redundant information. Many approaches have been proposed to address the problems of relevance and dimensionality reduction of the input space. They include algorithms for feature extraction, variable and feature selection, and example selection, to name a few. This document presents the report on a session project for the course IFT6266 Algorithmes d’Apprentissage. The objectives of the project are described in the next section. Section 3 briefly documents the concepts around variable selection. In section 4, we present a specific data mining problem and propose different approaches to variable selection to address the problem of relevance and dimensionality reduction. We document the results of our experimentation in section 5 and our conclusions in section 6. This is a preliminary version. At this stage, it is presented as a plan. It contains several elements that are incomplete and require further research and/or discussion.

Similar resources

Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques

Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...

A new model for mining method selection based on grey and TODIM methods

One of the most important steps involved in mining operations is to select an appropriate extraction method for mine resources. After choosing the extraction method, it is usually impossible to replace it with another one because it may be so expensive that implementation of the entire project could be economically impossible. Choosing a mining method depends on the geological and geometrical c...

Two DEA Models Employment in IS Project Selection for Iran Ministry of Commerce

Selection of an appropriate set of Information System (IS) projects is a critical business activity for any organization. In this paper, after describing the real IS project selection problem of the Iran Ministry of Commerce (MOC), we introduce two Data Envelopment Analysis (DEA) models. Then, we show the applicability of the introduced models for identifying the most efficient IS project fro...

Credit scoring in banks and financial institutions via data mining techniques: A literature review

This paper presents a comprehensive review of the work done during 2000–2012 on the application of data mining techniques in credit scoring. Yet there isn’t any literature in the field of data mining applications in credit scoring. Using a novel research approach, this paper investigates academic and systematic literature review and includes all of the journals in the Science direct onli...

A New Knowledge-Based System for Diagnosis of Breast Cancer by a combination of the Affinity Propagation and Firefly Algorithms

Breast cancer has become a widespread disease among young women around the world. Expert systems developed with data mining techniques are valuable tools in the diagnosis of breast cancer and can help physicians in the decision-making process. This paper presents a new hybrid data mining approach to classify two groups of breast cancer patients (malignant and benign). The proposed approach, AP-AMBFA, con...

Publication date: 2004